Abstract

Novel algorithms for object recognition are described that directly recover the transformations relating 
the image to its model. Unlike methods fitting the conventional framework, these new methods do not 
require exhaustive search for each feature correspondence in order to solve for the transformation. Yet 
they allow simultaneous object identification and recovery of the transformation. Given hypothesized 
corresponding regions in the model and data (2D views), which are from planar surfaces of the 3D
objects, these methods allow direct computation of the parameters of the transformation by which
the data may be generated from the model. We propose two algorithms: one based on invariants derived 
from no higher than second and third order moments of the image, the other via a combination of the 
affine properties of geometrical and differential attributes of the image. Empirical results on natural 
images demonstrate the effectiveness of the proposed algorithms. A sensitivity analysis of the algorithm is 
presented. We demonstrate in particular that the differential method is quite stable against perturbations 
— although not without some error — when compared with conventional methods. We also demonstrate 
mathematically that even a single point correspondence suffices, theoretically at least, to recover affine 
parameters via the differential method. 



Copyright © Massachusetts Institute of Technology, 1995 

This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. 
Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of 
the Department of Defense under Office of Naval Research contract N00014-91-J-4038. BKPH was also supported by NSF 
grant Smart Vision Sensors 9117724-MIP. 



1 Introduction 

Object recognition is one of the central problems in com- 
puter vision. The task of model-based object recognition 
(e.g., [4]) is to find the object model in the stored library 
that best fits the information from the given image. The 
most common methods of model based object recogni- 
tion fall into two categories from the point of view of how 
the objects are represented and how they are matched: 

The first represents objects by a set of local geometri- 
cal features — such as vertices that can be fairly stably 
obtained over different views — and matches the model 
features against the image features, typically in an
exhaustive manner. In general, this type of method
simultaneously identifies the object and recovers the
transformation. Equivalently, it recovers the pose of the
object that would yield an image from the object model in
which the projected features best match those found in the
given image (e.g., [4, 7, 24, 11]). One such method is based
on the 'hypothesize and test' framework. It first
hypothesizes the minimum number of correspondences between
model and image features that are necessary to compute
the transformation (e.g., [7, 24]). Then, for each
hypothesized set of corresponding features, the transformation is
computed and then used to reproject the model features 
onto the image features. The hypothesized match is then 
evaluated based on the number of projected features that 
are brought into close proximity to corresponding image 
features, and the pair of the transformation and model 
with the best match is selected. 

While this approach has achieved remarkable suc- 
cess in recognizing objects, particularly in dealing with 
the problem of occlusions of object surfaces, it still has 
practical computational problems, due to its exhaustive 
search framework. For example, even with a popular
algorithm [7] for matching model objects with m features
against image data with n features, we have to test on the
order of $m^3 n^3$ combinations, where m and n are easily
on the order of several hundred in natural pictures.
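
To give a feel for the size of this search space, the short sketch below evaluates the $m^3 n^3$ count for a few feature counts. The function name and the sample values of m and n are illustrative assumptions, not figures taken from [7].

```python
def alignment_hypotheses(m: int, n: int) -> int:
    """Order-of-magnitude number of triple-to-triple pairings: m^3 * n^3."""
    return (m ** 3) * (n ** 3)


# Hypothetical feature counts in the "several hundred" range.
for m, n in [(100, 100), (200, 200), (300, 300)]:
    print(m, n, f"{alignment_hypotheses(m, n):.1e}")
```

Even at a few hundred features per side, the count is far beyond what can be tested exhaustively, which is the motivation for the direct methods that follow.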

On the other hand, approaches in the second category 
represent objects by more global features. One method 
of this type is the moment invariant method. It combines 
different moments to represent the object, and matches 
the object model and image data in moment space[6, 19, 
1]. The chosen combinations of moments are designed 
so that they are invariant to the image transformations 
of concern, such as translations, dilation, and rotations. 
Thus, emphasis is mainly placed on the identification of 
the object in terms of the object model represented by 
the combinations of the moments, rather than on the 
recovery of the transformation between the model and 
the image data. 

In addition, most authors have not addressed the
general affine transformation case (instead treating
only translation, dilation and scaling). An exception
is the method by Cyganski et al. [2] based on
tensor analysis. They developed a closed form method
to identify a planar object in 3D space and to recover
the affine transformation which yields the best match
between the image data and the transformed model. The
basis of their method is the contraction operation of the
tensors [12, 9] formed by the products of the contravariant
moment tensors of the image with a covariant permutation
tensor that produces unit rank tensors. Then,
further combining those with zero-order tensors to remove
the weight, they derived linear equations for the
affine parameters sought. This method is quite elegant,
but it turns out that it needs moments up to at least
fourth order. In general, the second type of method
is very efficient when compared with the first type of
method, that is, methods based on local features plus
exhaustive search. At the same time, methods based
on invariants tend to be very sensitive to perturbations
in the given image data. For example, Cyganski's
algorithm is known to be very efficient computationally.
However, since higher order moments are notorious for
their sensitivity to noise [18], it is very fragile when it
comes to perturbations in the image data, being particularly
sensitive to local occlusions of object surfaces.

The algorithm that we propose in this paper can be 
classified in the second category for the reason given be- 
low. It is more efficient than conventional approaches 
in the first category, yet more stable than conventional 
methods of the second category: (1) it relies on the pres- 
ence of potentially corresponding image fragments over 
different views, that are from planar patches on the sur- 
face of the 3D objects, (2) it provides a non-recursive, 
that is, closed-form, method for object recognition. The 
method does not require complete image regions to be 
visible and does not depend on the use of local features 
such as edges or 'corners.' Our method also recovers the 
transformation from the object model to the image data, 
but, unlike Cyganski's method, it does not use moments 
of order higher than second or third order. Therefore, 
compared with Cyganski's method, it should be less sen- 
sitive to perturbations. In addition, we also present an- 
other new approach to robust object recognition using 
differential properties of the image. 

Thus, we propose two different algorithms: one based 
on an affine invariant unique to the given image, which 
uses up to second or third order moments of the image, 
and the other via a combination of second order statistics 
of geometrical and differential properties of the image. 
Both algorithms recover the affine parameters relating a 
given 2D view of the object to a model composed of pla- 
nar surfaces of a 3D object under the assumption of or- 
thographic projection[20, 10]. We also demonstrate that 
such methods based on the differential properties of the 
image are fairly stable against perturbations. Of course, 
the results are not perfect in the presence of perturba- 
tions, but the new method does provide much better 
results than conventional methods using global features. 
Although we do not explicitly address the problem of 
how to extract corresponding regions for planar patches 
in different views, it is known to be fairly feasible using 
one of several existing techniques (e.g., [22, 21, 23, 13]).
Once we have recovered the affine transformation for the 
planar patches, we know that by using the 3D object 
model we can immediately recover the full 3D informa- 
tion of the object [7]. Therefore, our algorithm is aimed 
at direct 3D object recognition, by first recognizing pla- 
nar surfaces on the object, and then recovering full 3D 
information, although the recovery of 3D information is
not explicitly addressed in this paper. Some experimental
results on natural pictures demonstrate the effectiveness
of our algorithm. We also give here an analysis of
the sensitivity of the algorithm to perturbations in the
given image data.

Figure 1: Commutative Diagram of Transformations
Given model feature X and corresponding data feature X',
we seek conditions on the transformations A, A' such that
this diagram commutes.

2 Recovering affine parameters via an 
affine invariant plus rotation 
invariant using no higher than 
second/third order moments 

In this section, we present a closed form solution for re- 
covering the affine parameters with which a given image 
can be generated from the model, using an affine invari- 
ant theory that we have recently proposed. We first sum- 
marize the affine invariant description (up to rotations) 
of the image of planar surfaces. Then, using this prop- 
erty, we show how the affine parameters are recovered 
via direct computation in conjunction with the rotation 
invariant using moments of the image. 

2.1 An affine invariant up to rotations: a 
unique class of linear transformations 

In [15, 16, 17], we showed that there exists a class of 
transformations of the image of a planar surface which 
generates unique projections of it up to rotations in the 
image field. It was precisely shown that this class of 
transformations is the only class of linear transforma- 
tions which provides invariance up to rotations, as long 
as we are concerned with no higher than second order 
statistics of the image (see [15]). This property is
summarized in the following theorem.
[Theorem]

Let X be a model feature position and X' be the
corresponding data feature position in the 2D field. We can
relate these by

    $X' = LX + \omega$    (1)

where L is a 2 x 2 matrix and $\omega$ is a 2D vector. Now
suppose both features are subjected to similar linear
transformations

    $Y = AX + B$    (2)
    $Y' = A'X' + B'$    (3)
    $Y' = TY + C$    (4)

where A, A', T are 2 x 2 matrices and B, B', C are 2D
vectors. Then, if we limit T to an orthogonal matrix, a
necessary and sufficient condition for these linear
transformations to commute (i.e., to arrive at the same values
for Y') for all X, X' (see Figure 1), as long as only up to
second order statistics of the features are available, is

    $A = c\,U\,[\Lambda]^{-1/2}\,\Phi^T$    (5)
    $A' = c\,U'\,[\Lambda']^{-1/2}\,\Phi'^T$    (6)

where $\Phi$ and $\Phi'$ are eigenvector matrices and $\Lambda$ and $\Lambda'$
are eigenvalue matrices of the covariance matrices of X
and X' respectively, U and U' are arbitrary orthogonal
matrices, and c is an arbitrary scalar constant. The
terms $[\cdot]^{1/2}$ denote square root matrices [8] (so that
$[\cdot]^{-1/2}$ is the inverse of the square root) and $[\cdot]^T$ means
matrix transpose. □

Furthermore, it was shown [15] that when (1) represents
the motion of a plane, and both $\Phi$ and $\Phi'$ represent
rotations/reflections simultaneously, and U and
U' are set to some rotation matrices, then T in (4) can
be constrained to be a rotation matrix. As another aspect
of this normalization process, we know that the
transformations A, A' defined in (5) and (6) transform the
respective distributions to have a covariance matrix that
is the identity matrix. Arguments were also given on
the physical explanations of this property for the rigid
object case. In [15, 16, 17], to recover the affine
parameters using this property, we used a clustering technique
to derive three potentially corresponding clusters in the
model and data 2D features and used their centroids as
matching features in the alignment framework.
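
The normalization described above can be sketched numerically as follows. This is a minimal illustration, assuming c = 1 and U equal to the identity; the helper names and the sample region are hypothetical and are not taken from [15]. It builds the normalizing matrix from the eigen-decomposition of a 2 x 2 covariance matrix and checks that the normalized point set has a covariance close to the identity.

```python
import math


def centered_covariance(points):
    """Return the centroid-centered points and their 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    syy = sum(y * y for _, y in centered) / n
    return centered, ((sxx, sxy), (sxy, syy))


def normalizing_matrix(cov):
    """A = Lambda^(-1/2) Phi^T, with c = 1 and U = identity (assumed choices)."""
    (a, b), (_, d) = cov
    t = 0.5 * math.atan2(2.0 * b, a - d)            # direction of the major axis
    c, s = math.cos(t), math.sin(t)
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s  # eigenvalue along (c, s)
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c  # eigenvalue along (-s, c)
    r1, r2 = 1.0 / math.sqrt(lam1), 1.0 / math.sqrt(lam2)
    return ((c * r1, s * r1), (-s * r2, c * r2))


def apply(m, points):
    """Apply the 2x2 matrix m to every point."""
    return [(m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)
            for x, y in points]


# Hypothetical region: a 10 x 5 grid of pixel positions, sheared and shifted.
region = [(1.3 * x + 0.4 * y + 7.0, -0.2 * x + 0.8 * y - 3.0)
          for x in range(10) for y in range(5)]
centered, cov = centered_covariance(region)
A = normalizing_matrix(cov)
_, cov_normalized = centered_covariance(apply(A, centered))
print(cov_normalized)  # close to the identity matrix
```

Any other orthogonal U would give an equally valid normalization, which is why the normalized shapes are determined only up to a rotation.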

In this section, we present other methods to directly
recover the affine parameters using this invariant
property. Recall that once we have normalized the image
using the transformations given in (5), (6), the shapes
are unique up to rotations. Thus, if we can compute the
rotation matrix T in (4) which relates the normalized
data image to the normalized model, we can recover
the affine transformation L by

    $L = A'^{-1} T A$    (7)

where the translational component has been removed, 
using the centroid coincidence property[2, 15]. Note 
however, that, since this normalization process trans- 
forms the covariance matrices into identity matrices 
times a scale factor, the covariances can no longer be 
used to compute the rotation angle between the normal- 
ized model and data features. So, we need to use some 
other information to determine this rotation angle. 
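
The following sketch illustrates why a single angle is all that remains to be found. It assumes c = 1, U = U' = identity, eigenvector matrices chosen as rotations, and a ground-truth map with positive determinant; the covariance values, the map `L_true` and the helper names are hypothetical. Here T is formed from the known `L_true` purely to inspect its structure, whereas in the method itself the angle has to come from image measurements.

```python
import math


def matmul(p, q):
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def transpose(m):
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))


def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))


def normalizing_matrix(cov):
    """A = Lambda^(-1/2) Phi^T with Phi a rotation (assumed c = 1, U = identity)."""
    (a, b), (_, d) = cov
    t = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(t), math.sin(t)
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c
    r1, r2 = 1.0 / math.sqrt(lam1), 1.0 / math.sqrt(lam2)
    return ((c * r1, s * r1), (-s * r2, c * r2))


# Hypothetical model covariance and ground-truth affine map (det L > 0).
cov_model = ((4.0, 1.0), (1.0, 2.0))
L_true = ((1.2, 0.4), (-0.3, 0.9))
cov_data = matmul(matmul(L_true, cov_model), transpose(L_true))

A = normalizing_matrix(cov_model)
A_prime = normalizing_matrix(cov_data)

# T relates the two normalized views and should be orthogonal.
T = matmul(matmul(A_prime, L_true), inverse(A))
TTt = matmul(T, transpose(T))

# Keep only the rotation angle, rebuild T from it, and apply equation (7).
angle = math.atan2(T[1][0], T[0][0])
R = ((math.cos(angle), -math.sin(angle)), (math.sin(angle), math.cos(angle)))
L_recovered = matmul(matmul(inverse(A_prime), R), A)
print(L_recovered)
```

Under these assumptions the recovered matrix agrees with `L_true` up to rounding, so the whole affine map is fixed once the relative rotation is known.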

2.2 Computing the rotation angle using second 
order weighted moments of the image 

Although the binary images of the model and the data
are normalized by the matrices A, A' so that they have
identity covariance matrices, the weighted moments of
the image function (for instance, brightness of the
image) are not normalized in that sense. Therefore, we
can compute the rotation angle between the normalized
binary images of model and data by first computing the
orientation of the major axis of each image in terms
of the weighted moments with respect to fixed
coordinates. We then take the difference between the absolute
orientations of the model and the data computed in this
fashion to give the relative rotation angle.

    $\tan 2\theta = \frac{2 M_{1,1}}{M_{2,0} - M_{0,2}}$    (8)

where $M_{i,j}$'s are second order weighted moments of the
normalized image given in the following:

    $M_{2,0} = \sum_x \sum_y x^2 f(x, y)$    (9)
    $M_{1,1} = \sum_x \sum_y x y \, f(x, y)$    (10)
    $M_{0,2} = \sum_x \sum_y y^2 f(x, y)$    (11)

where the origins of the normalized coordinates have been
centered at the centroid of each normalized region, $\theta$ is
the orientation of the normalized image, and f(x, y) is an
image function, such as brightness, defined on the
normalized coordinates. For the 'image' function, however,
brightness may not necessarily be the best choice.
A desirable property of the 'image' function here is
stability under varying ambient light conditions and under
changes in the relative orientation of the object surface
with respect to the camera and the light source in 3D space.
From the form of (8) with (9)-(11) it is clear that
the rotation angle thus recovered is unaffected by a
scale change of the image function between the model
and data views, since a common factor in the moments
cancels in the ratio. Therefore, the property we need here
from the image function is not perfect constancy, but
merely constancy within a scale factor under different
illumination conditions. This is not a hard requirement
in practice because we are now focusing on the properties
of planar surfaces. For example, if the sensor channels
are narrow band it is known that the outputs are invariant
up to a consistent scale factor over the entire surface
(see e.g. [14]). By equation (8), we get two different
candidate angles (after taking the direction of the
eigenvector with the larger eigenvalue). To select the correct
one, we can align the given image data with the image
reconstructed from the model using the recovered affine
parameters based on (7), and pick the one that gives the
best match.
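
As a small illustration of the orientation formula and its insensitivity to a global scale of the image function, the sketch below computes the major-axis angle from second order weighted moments for a synthetic patch and for a rotated, brighter copy. The patch, the attribute values and the function names are assumptions made for this example only.

```python
import math


def orientation(samples):
    """Major-axis angle theta with tan(2 theta) = 2 M11 / (M20 - M02).

    samples: iterable of (x, y, f) with (x, y) measured from the region centroid.
    The result lies in (-pi/2, pi/2]; theta + pi is the other candidate.
    """
    m20 = sum(f * x * x for x, y, f in samples)
    m11 = sum(f * x * y for x, y, f in samples)
    m02 = sum(f * y * y for x, y, f in samples)
    return 0.5 * math.atan2(2.0 * m11, m20 - m02)


def rotate(samples, phi):
    """Rotate sample positions by phi, keeping the attribute values."""
    c, s = math.cos(phi), math.sin(phi)
    return [(c * x - s * y, s * x + c * y, f) for x, y, f in samples]


# Hypothetical normalized model view: a symmetric 7 x 7 patch whose attribute
# f grows away from the y axis, so the weighted major axis lies along x.
model = [(float(x), float(y), 1.0 + 0.5 * x * x)
         for x in range(-3, 4) for y in range(-3, 4)]

# Hypothetical data view: the same patch rotated by 30 degrees and 3.7x brighter.
data = [(x, y, 3.7 * f) for x, y, f in rotate(model, math.radians(30.0))]

relative = orientation(data) - orientation(model)
print(math.degrees(relative))  # approximately 30
```

Using `atan2` selects the axis with the larger weighted spread directly; the remaining ambiguity between the returned angle and that angle plus 180 degrees still has to be resolved by comparing reconstructions, as described above.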

2.3 Rotation angle via Hu's moment
invariants: using 3rd order moments

If the image does not have enough texture, or if it is
a binary image, we cannot use weighted moments of
the image to compute the rotation angle between the
normalized image data and the model. In this case,
however, we can use the third order moments of the
binary image. The use of higher order moments for
invariance to rotation was extensively discussed in pattern
recognition (e.g., [6, 19, 1]). As a by-product of the study
of invariance, in [6] a method for computing the rotation
angle using higher order moments was also presented,
which we rewrite here:
where $N_{pq}$ and $N'_{pq}$ are the respective third order moments
of the normalized binary image for model and data given
in the following (shown only for the normalized model
view) and $I_{pq}$, $I'_{pq}$ are the complex moment invariants
proposed in [6].

    $N_{3,0} = \sum_x \sum_y x^3$    (14)
    $N_{2,1} = \sum_x \sum_y x^2 y$    (15)
    $N_{1,2} = \sum_x \sum_y x y^2$    (16)
    $N_{0,3} = \sum_x \sum_y y^3$    (17)


where the sums are taken over the entire region of the 
binary image in the normalized coordinate in which the 
coordinate origin has been centered at the centroid of 
the region. 
Thus, we have:

where,

An aspect to which we must pay careful attention in
using Hu's moment invariants is that of n-fold rotational
symmetry. As argued in [6], moment combinations $I_{pq}$'s
with the factor $e^{iw\theta}$, where w/n is not an integer, are
identically zero if the shape has n-fold rotational
symmetry, so that we cannot use those moments for recovering
the rotation angle (see [6] for detail). For example, for
the 4-fold symmetry case such as a square, both of the
formulas given in (18) and (19) are useless, and we need
higher than third order moments. This happens if the
original surface shape is a rectangle when viewed from
a particular direction in 3D space (of course including the
frontal direction). This is because if some image of the
surface can be a rectangle, then, no matter from what
direction it is viewed, its normalized binary image
becomes a square, and hence has 4-fold symmetry. This
is a consequence of the normalization process we are
using (see [15] for detail). We will see this case in the
experiments below.
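
The degenerate case can be seen in a few lines. The sketch below computes the four third order central moments of a binary region and compares a square with an L-shaped region; both regions and the function name are hypothetical. It only shows that the raw third order moments vanish for the square, which is enough to make any angle estimate built from them undefined.

```python
def third_order_central_moments(pixels):
    """Return (N30, N21, N12, N03) of a binary region given as (x, y) pixels."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    n30 = sum((x - cx) ** 3 for x, _ in pixels)
    n21 = sum((x - cx) ** 2 * (y - cy) for x, y in pixels)
    n12 = sum((x - cx) * (y - cy) ** 2 for x, y in pixels)
    n03 = sum((y - cy) ** 3 for _, y in pixels)
    return n30, n21, n12, n03


# Hypothetical regions: a 7 x 7 square (4-fold symmetric) and an L-shaped one.
square = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
l_shape = ([(x, y) for x in range(6) for y in range(2)]
           + [(x, y) for x in range(2) for y in range(2, 6)])

print(third_order_central_moments(square))   # all four vanish
print(third_order_central_moments(l_shape))  # not all zero
```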



2.4 Results in using invariants on natural 
pictures 

We now show experimental results obtained using the 
proposed algorithm for recovering affine parameters 
based on affine invariants of natural pictures. All the pic- 
tures shown here were taken under natural light condi- 
tions. The image regions, which are from planar patches 
on the object surfaces, were extracted manually with 
some care, but some perturbations may be introduced 
by this step of the procedure. Figure 2 shows the results 
on images of a Cocoa-Box. The upper row of pictures
shows the two gray level pictures of the same Cocoa-Box
taken from different view points: the left view was used 
for the model, while the right was used for the data. The 
left and right figures in the middle row show the respec- 
tive normalized images up to a rotation. Indeed, we see 
that the two figures coincide if we rotate the left fig- 
ure by 180 degrees around its centroid. The left and 
right figures in the lower row are the respective recon- 
structed image data from the model view (shown in the 
upper left) by the recovered affine transformation using, 
lower left: affine invariant plus second order weighted 
moments of the gray level, lower right: third order mo- 
ments of the binary image for computing the rotation 
angle. If the method works correctly, then those recon- 
structed images should coincide with the corresponding 
image portion found in the upper right figure. Indeed, 
we see that both of the methods worked very well for 
recovering the transformation parameters. 

In Figure 3 the results are shown for pictures of a
Baby-Wipe container. The upper row of pictures shows
the source gray level pictures of a Baby-Wipe container
of which the front part was used for the experiment: the
left view was used for the model, the right view was used
for the data. The left and right figures in the middle row
show the respective normalized images. The lower figure
is the reconstructed image data from the model view
using the affine transformation recovered by means of
the affine invariant plus second order weighted moments for
computing the rotation angle. We would expect the
reconstructed image to coincide well with the image in the
upper right. From the figure, we see that this method,
i.e., affine invariant plus second order weighted moments,
worked very well for recovering the parameters. As
observed in the figures, the normalized images are almost
4-fold rotationally symmetric, so that, as described
previously, we cannot use the third order moments
of the normalized binary image to recover the rotation
angle.

Figure 4 shows the results on some Tea-Box pictures. 
The upper row shows the pictures of a Tea-Box: the left 
view was used for the model, while the right view was 
used for the data. The left and right figures in the middle 
row are the respective normalized images up to a rota- 
tion. The left and right figures in the lower row show the 
respective reconstructed image data from the model view 
using the recovered affine transformation based on affine 
invariant plus second order weighted moments of the 
gray level (left) and third order moments of the binary 
image (right) for recovering the rotation angle. From the
figure, we see that both of the reconstructed images
coincide well with the original data shown in the upper right.
Though both methods worked fairly well, the method
using second order weighted moments performed slightly
better. Considering that both of the reconstructed
images are tilted a little in a similar manner, perhaps
some errors were introduced in the manual region
extraction.

3 A sensitivity analysis in the use of 
affine plus rotation invariant 

In this section we analyze the sensitivity of the pro- 
posed algorithm for recovering affine transformations us- 
ing affine invariant plus second order weighted moments 
of the image function to perturbations in the image data. 
Perturbations are caused, for example, by errors in
region extraction, by lack of planarity of the object
surface, or by occlusions. From (7), we know that the
sensitivity of the recovered affine parameters to
perturbations depends solely on the stability of A', the matrix
normalizing the given binary image, and T, the rotation
matrix relating the normalized model and the data
views, as we assume that the model, and hence A, does not
include any perturbations. As described in Section 2.1, the
transformation A' can be computed solely using eigen- 
values and eigenvectors of the covariance matrix of the 
original binary image, i.e., the set of (x,y) coordinates 
contained in the image region. Therefore, if the given 
image contains perturbations, these have effects on the 
matrix A', but only through the covariances. In other 
words, the errors in A' can be completely described by 
the perturbations expressed in terms of covariances. On 
the other hand, the effect of the perturbations on the 
recovered rotation matrix differs according to which al- 
gorithm we take for computing rotation, namely, the 
weighted moments of the image attributes, or the third 
order moments of the binary image of the objects. In 
this section, we only show the case for second order 
weighted moments of the image attributes. The
perturbation analysis of the algorithm based on third order
moments may be presented in a subsequent paper.

3.1 Analytical formula for sensitivity 

In the following, we derive the sensitivity formulas for
the affine parameters to be recovered, given perturbations
in the image data with respect to the model. Let
the ideal description (without any errors) of the
normalization process be represented as:

    $\tilde{A}' \tilde{L} A^{-1} = \tilde{T}$    (24)

and the affine parameters are recovered by (cf. (7)):

    $\tilde{L} = \tilde{A}'^{-1} \tilde{T} A$    (25)

Throughout the subsequent parts of the paper, we
consistently use the notation [ ~ ] (tilde) for ideal parameter
values and the notation without a tilde for actually observed
values, unless otherwise stated. Then, the perturbation
$\Delta L$ of L is given as follows:
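
    $-\Delta L = L - \tilde{L}$
    $= \{\tilde{A}'^{-1}(I - \Delta A' \tilde{A}'^{-1})^{-1}(\tilde{T} - \Delta T) - \tilde{A}'^{-1}\tilde{T}\}A$
    $= [\tilde{A}'^{-1}\{\sum_{k=0}^{\infty}(\Delta A' \tilde{A}'^{-1})^{k}\}(\tilde{T} - \Delta T) - \tilde{A}'^{-1}\tilde{T}]A$
    $= \tilde{A}'^{-1}(\Delta A' \tilde{A}'^{-1}\tilde{T} - \Delta T)A + O(\Delta^2)$    (26)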

where $-\Delta A'$ and $-\Delta T$ are the respective perturbations of
A' and T such that $-\Delta A' = A' - \tilde{A}'$, $-\Delta T = T - \tilde{T}$.
The minus signs for the perturbations are for consistency
with the perturbation of the covariances which will
appear soon. Thus, ignoring terms of higher than first
order, our task is to derive formulas
for $\Delta T$ and $\Delta A'$ in terms of the perturbations contained in
the image data.
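
A quick numerical check of the first order expression in (26) is sketched below. All matrices and perturbation values are made-up test inputs, and the helper names are hypothetical. It compares the exact change in the recovered L with the first order term and reports the gap, which should shrink like the square of the perturbation size.

```python
import math


def matmul(p, q):
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))


def sub(p, q):
    return tuple(tuple(p[i][j] - q[i][j] for j in range(2)) for i in range(2))


def scale(m, k):
    return tuple(tuple(k * v for v in row) for row in m)


eps = 1e-4
A_model = ((0.5, 0.2), (0.1, 0.7))            # A (assumed unperturbed)
A_ideal = ((0.8, 0.1), (-0.2, 1.1))           # ideal A'
T_ideal = ((math.cos(0.3), -math.sin(0.3)),
           (math.sin(0.3), math.cos(0.3)))    # ideal T
dA = scale(((1.0, -2.0), (0.5, 1.5)), eps)    # Delta A'
dT = scale(((-1.0, 0.7), (0.3, 2.0)), eps)    # Delta T

A_obs = sub(A_ideal, dA)                      # observed A' = ideal A' - Delta A'
T_obs = sub(T_ideal, dT)                      # observed T  = ideal T  - Delta T


def recover(a_prime, t):
    """L = A'^(-1) T A, as in (7)."""
    return matmul(matmul(inverse(a_prime), t), A_model)


exact = sub(recover(A_obs, T_obs), recover(A_ideal, T_ideal))   # -Delta L
inner = sub(matmul(matmul(dA, inverse(A_ideal)), T_ideal), dT)
first_order = matmul(matmul(inverse(A_ideal), inner), A_model)

gap = max(abs(exact[i][j] - first_order[i][j]) for i in range(2) for j in range(2))
print(gap)  # of order eps**2
```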

[Perturbations in A']

As observed in (6), A' is a function of the eigenvalues $\lambda_r$'s
and eigenvectors $\Phi_r$'s of the covariance matrix $\Sigma'$ such
that $A'_{ij}(\lambda, \Phi) = \lambda_i^{-1/2} \Phi_{ji}$, where $\lambda_r$ is the rth eigenvalue
and $\Phi_{sr}$ is the sth component of the corresponding rth
eigenvector $\Phi_r$. Let $\tilde{A}'_{ij}$ be the ideal value for $A'_{ij}$, the
ij component of the matrix A'. Then, we get a formula
for the perturbations $\Delta A'_{ij}$ from the Taylor expansion of
A' in terms of $\lambda$ and $\Phi$ as follows:
where the perturbations of the eigen properties are defined
as $-\Delta\lambda_s = \lambda_s - \tilde{\lambda}_s$, $-\Delta\Phi_i = \Phi_i - \tilde{\Phi}_i$.

Here, from perturbation theory[S\, we have: 
where $(k, l) \in \{(1, 2), (2, 1)\}$ and $-\Delta\Sigma'$ is the perturbation
of the given covariances such that $-\Delta\Sigma' = \Sigma' - \tilde{\Sigma}'$.
The minus sign of the perturbation of covariances
accounts for the occlusions (being occluded by some other
surface) occurring in the given image data. Substituting
(28) into (27), we obtain:
The equations (30)-(33) give the first order
approximation of the perturbation $\Delta A'_{ij}$ for $A'_{ij}$, that is, a
linear combination of the perturbations $\Delta\Sigma'_{pq}$ such that

composed of the eigen properties of the covariance
matrix of the ideal image data, that are uniquely determined
by (30)-(33) and are independent of the perturbations.



[Perturbations in T: the rotation matrix] 

In deriving an analytical formula for the perturbation
$\Delta T$, we rely on the formulas given in (8)-(11), relating
the rotation angle to the second order weighted moments
of the image (as we have fixed the orientation of
the model, the orientation of the given image can be seen to
be equivalent to the rotation angle). Further, we have
the following relation between the weighted moments of
the original and the normalized images.

    $[\tilde{M}'] = \tilde{A}' [\tilde{m}'] \tilde{A}'^T$    (34)
    $[M'] = A' [m'] A'^T$    (35)
    $= (\tilde{A}' - \Delta A')([\tilde{m}'] - [\Delta m'])(\tilde{A}' - \Delta A')^T$    (36)

where [m'], [M'] are the respective symmetric matrices of
original and transformed weighted moments defined in
the following and the term $[\Delta m']$ is the matrix of the
perturbation contained in the original image data in terms
of the weighted moments:
Let $\theta$, $\tilde{\theta}$ be the recovered and ideal rotation angles and
$-\Delta\theta$ be the corresponding perturbation, where we
assume that $\Delta\theta$ is small enough such that:
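In the following, we derive a formula for the perturbation
$\Delta\theta$ in using second order weighted moments of the image. As
we assumed that $-\Delta\theta = \theta - \tilde{\theta}$ is small enough, we can
approximate it as:

    $-\Delta\theta = \frac{1}{2}(2\theta - 2\tilde{\theta}) \approx \frac{1}{2} \frac{1}{1 + \{\tan(2\tilde{\theta})\}^2} \{\tan(2\theta) - \tan(2\tilde{\theta})\}$    (41)

Substituting the relation presented in (8), we get:

    $-\Delta\theta \approx \frac{1}{2} \frac{1}{1 + \{\tan(2\tilde{\theta})\}^2} \left( \frac{2M'_{1,1}}{M'_{2,0} - M'_{0,2}} - \frac{2\tilde{M}'_{1,1}}{\tilde{M}'_{2,0} - \tilde{M}'_{0,2}} \right)$    (42)
    $\approx \frac{1}{1 + \{\tan(2\tilde{\theta})\}^2} \frac{J}{(\tilde{M}'_{2,0} - \tilde{M}'_{0,2})^2}$    (43)
    $= \frac{J}{(\tilde{M}'_{2,0} - \tilde{M}'_{0,2})^2 + (2\tilde{M}'_{1,1})^2}$    (44)

where

    $J = M'_{1,1}(\tilde{M}'_{2,0} - \tilde{M}'_{0,2}) - \tilde{M}'_{1,1}(M'_{2,0} - M'_{0,2})$    (45)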
Substituting (37)-(39) into (45) we get:

    $J = e_{11}\Delta A'_{11} + e_{12}\Delta A'_{12} + e_{21}\Delta A'_{21} + e_{22}\Delta A'_{22}$
    $+ f_{20}\Delta m'_{20} + f_{11}\Delta m'_{11} + f_{02}\Delta m'_{02} + O(\Delta^2)$    (46)

where $e_{ij}$'s, $f_{pq}$'s are the respective coefficients of $\Delta A'_{ij}$ and
$\Delta m'_{pq}$, composed of the components $\tilde{A}'_{ij}$ and
$\tilde{m}'_{rs}$, which are independent of the perturbations involved
in the given image data.

Then, combining (40), (44), and (46), we get $\Delta T$.

[Perturbation in L] 

Finally, combining the formulas for $\Delta T$ thus derived with
(30)-(33) and substituting into (26), we obtain the
perturbation $\Delta L$:

    $-\Delta L_{ij} = \sum_{r,s} (\xi^{ij}_{rs} \Delta\Sigma'_{rs} + \eta^{ij}_{rs} \Delta m'_{rs})$    (47)

where $\xi^{ij}_{rs}$, $\eta^{ij}_{rs}$ are coefficients that are exclusively
composed of the components $\tilde{A}'_{ij}$, $\tilde{m}'_{rs}$ and $A_{ij}$, which
are independent of the perturbations, and $(r, s) \in
\{(2, 0), (1, 1), (0, 2)\}$. By this linear combination of $\Delta\Sigma'_{rs}$
and $\Delta m'_{rs}$, we have obtained the first order
approximation of $\Delta L_{ij}$, the perturbation of $L_{ij}$, given the
perturbations in the original image data in terms of the
second order moments of the binary image ($\Delta\Sigma'_{rs}$) and the
second order weighted moments of the image attributes
($\Delta m'_{rs}$).

3.2 Experiments 

Now we show the sensitivity of the proposed algorithm
for recovering affine parameters based on affine invariant
plus second order weighted moments to perturbations of
the given image region. From (47), we know that the
perturbation of each recovered component $L_{ij}$ is a linear
combination of perturbations of moments of the given
image. Here, for simplicity, we try to capture the overall
trend of the sensitivity of L to perturbations in the given
data by examining the following formulas:

    $\delta = \sqrt{ \frac{\sum_{i,j} (\Delta L_{ij})^2}{\sum_{i,j} (L_{ij})^2} }$    (48)

against:

    $\sigma^2 = \sqrt{ \frac{\sum_{r,s} \{ l^2 (\Delta\Sigma'_{rs})^2 + (\Delta m'_{rs})^2 \}}{\sum_{r,s} \{ l^2 (\Sigma'_{rs})^2 + (m'_{rs})^2 \}} }$    (49)


where $l$ is a balancing parameter, and in the following
experiments we set it to 255/2. The terms $\delta$, $\sigma^2$ express
respectively the normalized errors of the recovered affine
parameters and the normalized perturbations in terms
of the moments of the image. We expect that those two
formulas show a monotonic relation when perturbations
in the moments are small. Of course, we know from the
above arguments that there will be some complicated
interactions between the two, but we hope some insight
may be obtained by observing those two formulas. We
use the same picture of a Cocoa-Box used in the earlier
experiments. To study the effects of occlusion,
perturbations in the image data were produced by dropping
particular connected regions from the (almost) perfect
image data, as shown in Figure 5. The upper pictures
show examples of the perturbed image data for which
some percentage of the image region was dropped: left
5%, middle 15%, right 25%. The lower pictures show
the respective reconstructed image data. Figure 6 shows
$\delta$ (vertical axis) versus $\sigma^2$ (horizontal axis), in which
the perturbations were taken from 2.5% to 25% in 2.5%
steps. From the figure, we see that $\delta$, the error in
recovering affine parameters, is almost proportional to $\sigma^2$, the
perturbations, when it is small, but the slope increases
a lot as $\sigma$ increases.



4 Using differential properties of the 
image: without invariants 

In this section, we derive another constraint equation on 
affine parameters based on the differential properties of 
the image, and combine it with the canonical geometrical 
constraint given in (1) to recover the affine transforma- 
tion. We rewrite here the geometric constraint on the 
motion of planar surfaces for convenience: 



X' 



LX 



(50) 



where the translational component has been eliminated 
(based on the invariance of the region centroids). Deriv- 
ing the covariance matrices on both sides of (50) gives: 

$$\Sigma_{X'} = L \Sigma_X L^T \qquad (51)$$

where the indices of the covariances $\Sigma_{X'}$, $\Sigma_X$ show the
corresponding distributions. Due to the symmetry of the
covariance matrix, (51) gives only three independent
equations for the four unknown components of L.
Therefore, we need another constraint to solve for L.
(Comment: the constraint on the ratio of the image areas,
det[L] = AREA(X')/AREA(X), is redundant here when
one employs (51).) From this point
of view, what we have done in the preceding sections can 
be seen as imposing constraints of the rotations between 
the normalized (up to rotations) images either in terms 
of weighted moments of some image attribute or using 
third order moments of the binary image. Here, we will 
seek another constraint which does not use invariants, 
based on the differential property of the image which is 
related to the underlying geometry of the image. 
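
As a quick numerical illustration of the geometric constraint (51), the following sketch uses a synthetic point set and a made-up matrix `L_true` (both are assumptions for illustration, not data from the experiments). It also shows why the area-ratio constraint adds nothing new: it follows from (51) by taking determinants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model region: 2D points with the centroid removed.
X = rng.uniform(-1.0, 1.0, size=(5000, 2))
X -= X.mean(axis=0)

# Hypothetical linear part of the affine transformation.
L_true = np.array([[1.2, 0.4],
                   [-0.3, 0.9]])
Xp = X @ L_true.T                      # X' = L X applied to every point

def covariance(P):
    """Second order central moments of a point set (rows are points)."""
    P = P - P.mean(axis=0)
    return P.T @ P / len(P)

Sx, Sxp = covariance(X), covariance(Xp)

# Constraint (51): Sigma_X' = L Sigma_X L^T
residual_51 = np.abs(Sxp - L_true @ Sx @ L_true.T).max()

# Determinants of (51) give det[L]^2, i.e. the squared area ratio.
area_ratio_sq = np.linalg.det(Sxp) / np.linalg.det(Sx)
```

Since `Sxp` is symmetric, only three of its four entries are independent, which leaves one degree of freedom of L undetermined.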

4.1 Deriving another constraint based on 
differential properties of the image 

To derive another constraint on affine parameters, sup- 
pose that we have an image attribute E(X) — some 
scalar function of position X in the image field — that 
is related to E'(X') of the corresponding point X' in 
another view by: 



$$E(X) = \frac{1}{\rho} E'(X') \qquad (52)$$

where X and X' are related by (50) and $\rho$ is a scalar
constant. This represents a constraint that the changes 
of the function E between the different views are only 
within a scale factor that is consistent over the specified 
region. Again, we can claim, as in the previous discus- 
sion of 2.2, that this constraint is a fairly reasonable one. 
Taking the gradient of both sides of (52), 



$$(E_x, E_y)^T = \frac{1}{\rho} J^T (E'_{x'}, E'_{y'})^T \qquad (53)$$



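where $E_s$ denotes the partial derivative of E with respect
to the variable s, and J is the Jacobian of X' with respect
to X, such that

$$J = \begin{pmatrix} \partial x'/\partial x & \partial x'/\partial y \\ \partial y'/\partial x & \partial y'/\partial y \end{pmatrix} \qquad (54)$$

$$\phantom{J} = L \qquad (55)$$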



we get a similar constraint to that on the geometry given 
in (50), in the differential image, that includes the same 
affine parameters L: 



$$U = \frac{1}{\rho} L^T U' \qquad (56)$$

where $U = (E_x, E_y)^T$ and $U' = (E'_{x'}, E'_{y'})^T$. Taking the
covariances brings another constraint on the affine
parameters in terms of the second order statistics of the
differential image, as follows:

$$\Sigma_U = \frac{1}{\rho^2} L^T \Sigma_{U'} L \qquad (57)$$



Thus, we have obtained two constraint equations in 
(51), (57) on affine parameters which are composed of 
up to second order statistics of the geometry and the 
differential properties of the image. 
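
The differential constraint can be checked numerically. The sketch below is only an illustration under stated assumptions: the attribute `E_prime`, the matrix `L`, and the scale `rho` are made up, and the gradient of E is taken by central differences rather than from image data.

```python
import numpy as np

rho = 1.5                                   # hypothetical scale factor
L = np.array([[1.1, 0.3], [-0.2, 0.8]])     # hypothetical affine matrix

def E_prime(p):
    """Hypothetical smooth attribute E' in the data view."""
    x, y = p
    return np.sin(x) * np.cos(0.5 * y) + 0.1 * x * y

def grad_E_prime(p):
    """Analytic gradient of E_prime."""
    x, y = p
    return np.array([np.cos(x) * np.cos(0.5 * y) + 0.1 * y,
                     -0.5 * np.sin(x) * np.sin(0.5 * y) + 0.1 * x])

def E(p):
    """Model-view attribute: E(X) = E'(L X) / rho."""
    return E_prime(L @ p) / rho

def numerical_grad(f, p, h=1e-6):
    """Central-difference gradient of a scalar function of a 2-vector."""
    g = np.zeros(2)
    for i in range(2):
        d = np.zeros(2)
        d[i] = h
        g[i] = (f(p + d) - f(p - d)) / (2 * h)
    return g

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(200, 2))
U = np.array([numerical_grad(E, p) for p in pts])    # rows are U^T
Up = np.array([grad_E_prime(L @ p) for p in pts])    # rows are U'^T

# Pointwise relation: U = (1/rho) L^T U'  (written for row vectors)
max_err_pointwise = np.abs(U - (Up @ L) / rho).max()

# Second order statistics: Sigma_U = (1/rho^2) L^T Sigma_U' L
Su = np.cov(U.T, bias=True)
Sup = np.cov(Up.T, bias=True)
max_err_cov = np.abs(Su - L.T @ Sup @ L / rho**2).max()
```

The covariance relation holds for any set of sample points, because it follows from the pointwise relation alone.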

4.2 Solving for the matrix L 

We show how we can solve for the affine transformation 
L, combining the constraints of the geometry and the dif- 
ferential properties of the image. We anticipate that in 



practice, due to the limited dynamic range of the sensor 
device as well as its spatial resolution, the geometrical 
constraint would probably be more reliable than the dif- 
ferential constraints. Therefore, we incorporate all the 
three geometrical equations given in (51) with one of the 
three differential constraints given in (57) to get a solu- 
tion for L. But, for the purpose of stability, we will try 
all the possible combinations of the set of the three from 
(51) with every one of (57), and choose the best-fit match 
in terms of the alignment of the model with the image 
data, just as in the case of using the affine invariant. 
Combining (51) and (57) we immediately get: 



$$\rho = \sqrt[4]{\frac{\det[\Sigma_{X'}]\,\det[\Sigma_{U'}]}{\det[\Sigma_X]\,\det[\Sigma_U]}} \qquad (58)$$
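
The intermediate step behind this expression for the scale factor can be written out as follows. This is a sketch of the reasoning, assuming $\rho > 0$ and $\det[L] \neq 0$; it uses only the determinants of both sides of (51) and (57).

```latex
\begin{align*}
\det[\Sigma_{X'}] &= \det[L]^2 \,\det[\Sigma_X]
  && \text{from (51)} \\
\det[\Sigma_U] &= \rho^{-4}\,\det[L]^2 \,\det[\Sigma_{U'}]
  && \text{from (57), for $2\times 2$ matrices} \\
\Rightarrow\quad
\rho^4 &= \frac{\det[L]^2\,\det[\Sigma_{U'}]}{\det[\Sigma_U]}
        = \frac{\det[\Sigma_{X'}]\,\det[\Sigma_{U'}]}
               {\det[\Sigma_X]\,\det[\Sigma_U]}
\end{align*}
```

Eliminating $\det[L]^2$ in this way uses up one of the three equations in (57), which is why only two of them remain available for fixing the rotation.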

Since covariance matrices are positive definite and sym- 
metric, it is not hard to see from equation (51) that L 
can be written as: 

$$L = \Sigma_{X'}^{1/2}\, Q\, \Sigma_X^{-1/2} \qquad (59)$$

where $\Sigma_{X'}^{1/2}$, $\Sigma_X^{1/2}$ are the respective positive definite
symmetric square root matrices of $\Sigma_{X'}$, $\Sigma_X$, which are
unique [8], and Q is an orthogonal matrix, accounting for
the remaining one degree of freedom. Considering the fact
that $0 < \det[L] = \det[\Sigma_{X'}^{1/2}]\,\det[Q]\,\det[\Sigma_X^{-1/2}]$, we know that
Q must be a rotation matrix, so that Q may be written
as:

$$Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad (60)$$



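thus we have:

$$L = \begin{pmatrix} e_{11}\cos\theta + f_{11}\sin\theta & e_{12}\cos\theta + f_{12}\sin\theta \\ e_{21}\cos\theta + f_{21}\sin\theta & e_{22}\cos\theta + f_{22}\sin\theta \end{pmatrix} \qquad (61)$$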

where the coefficients $e_{ij}$, $f_{ij}$ are composed of the
elements of $\Sigma_{X'}^{1/2}$ and $\Sigma_X^{-1/2}$ and are uniquely
determined by (59). Substituting (61) into each of the two
remaining equations in (57) (we have already used one for
solving for $\rho$) yields:

$$k_{ij}(\cos\theta)^2 + 2 l_{ij}(\cos\theta)(\sin\theta) + m_{ij}(\sin\theta)^2 = \rho^2 p_{ij} \qquad (62)$$

where $k_{ij}$, $2 l_{ij}$, $m_{ij}$ are the respective coefficients of
$(\cos\theta)^2$, $(\cos\theta)(\sin\theta)$, and $(\sin\theta)^2$ in the $ij$ components
of the resulting matrices on the left hand side, which are
composed of the coefficients $e_{pq}$, $f_{rs}$ and the elements of
$\Sigma_{U'}$, and $p_{ij}$ is the corresponding component of $\Sigma_U$ on
the right hand side. Solving equation (62)
we get:



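$$\cos\theta = \pm\sqrt{N_c / D_c} \qquad (63)$$

$$\sin\theta = \pm\sqrt{N_s / D_s} \qquad (64)$$

where,

$$N_c = 2l^2 + (m - \rho^2 p)(m - k) \pm \sqrt{4l^2\left(l^2 - (\rho^2 p - m)(\rho^2 p - k)\right)} \qquad (65)$$

$$D_c = (m - k)^2 + 4l^2 \qquad (66)$$

$$N_s = 2l^2 + (k - \rho^2 p)(k - m) \pm \sqrt{4l^2\left(l^2 - (\rho^2 p - k)(\rho^2 p - m)\right)} \qquad (67)$$

$$D_s = (k - m)^2 + 4l^2 \qquad (68)$$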



where indices have been suppressed for simplicity. By
substituting this back into (61), we finally obtain the
four possible candidates for L. To select the best one out
of this candidate set, we try out all the candidates
using the alignment approach and pick the one that fits
best.
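
The whole procedure of this subsection can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the covariance matrices are synthetic, the function names are hypothetical, and equation (62) is solved through the equivalent double-angle form $\frac{k+m}{2} + \frac{k-m}{2}\cos 2\theta + l \sin 2\theta = \rho^2 p$ rather than through the closed-form expressions for $\cos\theta$ and $\sin\theta$. The final alignment-based selection is not reproduced.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric positive definite square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def candidate_transforms(Sx, Sxp, Su, Sup, ij=(0, 0)):
    """Return rho and four candidate matrices L from second order statistics."""
    # Scale factor from the determinants of (51) and (57).
    rho = (np.linalg.det(Sxp) * np.linalg.det(Sup)
           / (np.linalg.det(Sx) * np.linalg.det(Su))) ** 0.25
    A = sym_sqrt(Sxp)                       # Sigma_X'^(1/2)
    B = np.linalg.inv(sym_sqrt(Sx))         # Sigma_X^(-1/2)
    J = np.array([[0.0, -1.0], [1.0, 0.0]])
    E, F = A @ B, A @ J @ B                 # L(theta) = cos(theta) E + sin(theta) F
    i, j = ij
    k = (E.T @ Sup @ E)[i, j]
    m = (F.T @ Sup @ F)[i, j]
    l = 0.5 * (E.T @ Sup @ F + F.T @ Sup @ E)[i, j]
    q = rho**2 * Su[i, j]
    # Double-angle form: (k+m)/2 + R cos(2 theta - phi) = q
    R = np.hypot(0.5 * (k - m), l)
    phi = np.arctan2(l, 0.5 * (k - m))
    t = np.clip((q - 0.5 * (k + m)) / R, -1.0, 1.0)
    cands = []
    for sign in (1.0, -1.0):
        for shift in (0.0, np.pi):
            theta = 0.5 * (phi + sign * np.arccos(t)) + shift
            cands.append(np.cos(theta) * E + np.sin(theta) * F)
    return rho, cands

# Synthetic check with made-up statistics (assumptions for illustration).
L_true = np.array([[1.2, 0.4], [-0.3, 0.9]])
rho_true = 1.5
Sx = np.array([[2.0, 0.3], [0.3, 1.0]])
Sup = np.array([[1.5, -0.4], [-0.4, 0.7]])
Sxp = L_true @ Sx @ L_true.T                   # geometric constraint
Su = L_true.T @ Sup @ L_true / rho_true**2     # differential constraint

rho, cands = candidate_transforms(Sx, Sxp, Su, Sup)
best_err = min(np.abs(L - L_true).max() for L in cands)
```

Note that L and -L satisfy both second order constraints equally well, so the statistics alone cannot separate them; some external test, such as the alignment check described above, is needed to pick the final candidate.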

The advantage of using gradient distributions of the 
image functions, compared with using only geometri- 
cal properties, is that their covariances may not be as 
strongly disturbed by local missing regions or occlusions. 
Actually, we show below a demonstration of this using 
experiments on natural images. In this section we
described a method that combines differential and
geometrical properties of the image, but we might be able
to derive a different method for recovering the affine
parameters if we had more than one reliable image
attribute. By combining the constraints from two such
attributes, instead of incorporating geometry, we may be
able to develop a method that would be less affected by
missing regions.

Since the major disadvantage of using global features
such as moments is their apparent sensitivity to local
disturbances, this approach (that is, the use of
differential properties) could be a key to improving the
stability of the algorithms. In the Appendix we also show,
at least mathematically, that even a single point
correspondence between the model and data 2D views
suffices to recover the affine parameters, if some image
function is available that is invariant under the change
of orientation of the surface.

[Summary] 

In this section so far, we have mathematically derived 
a constraint equation on affine parameters based on the 
differential properties of the image in terms of its second 
order statistics. Then, combining this constraint with
the canonical geometric constraint, again in terms of
second order statistics, we have shown how we can solve
for the affine parameters by a direct computation.

4.3 Results using differential properties on 
natural pictures 

Results using the algorithm via combination of the ge- 
ometrical and differential properties of the image are 
shown on the same natural pictures used in the earlier 
experiments for the method based on affine invariants. 
We used the gradient of the gray level (brightness) im- 
age function for the differential data. Note that even 
though the picture given in the following shows only the 
data for the manually extracted region used for recogni- 
tion, we actually use the original image when calculating 
the gradient at each point. As a result, the artificially 
introduced edges of the extracted region do not have 
any effect on the derivation of the gradient distribution. 
Note that this is very important in demonstrating the ef- 
fectiveness of our method, because otherwise larger con- 
tributions on the covariances of gradient distributions 
would be made by the artificially constructed edges. 
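
The point about artificial edges can be illustrated with a small sketch. It assumes a synthetic brightness image and a square region mask (both hypothetical), and uses `numpy.gradient` as a stand-in for whatever gradient operator was actually used.

```python
import numpy as np

def gradient_covariance(image, mask):
    """Covariance of the gradient (f_x, f_y) over the pixels selected by mask."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows (y), columns (x)
    U = np.stack([gx[mask], gy[mask]], axis=1)
    U -= U.mean(axis=0)
    return U.T @ U / len(U)

# Hypothetical smooth brightness image and manually extracted region.
yy, xx = np.mgrid[0:100, 0:100]
image = 5.0 + np.sin(0.2 * xx) + np.cos(0.15 * yy)
mask = np.zeros(image.shape, dtype=bool)
mask[30:70, 30:70] = True

# Gradient taken on the original image, then restricted to the region.
cov_original = gradient_covariance(image, mask)

# Gradient taken on the cut-out region: the boundary acts as a strong edge.
cov_cut_out = gradient_covariance(np.where(mask, image, 0.0), mask)
```

In this synthetic case the covariance computed from the cut-out image is dominated by the boundary pixels, which is the effect the procedure above is designed to avoid.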

Figure 7 shows the results on the Cocoa-Box pictures. 
The left and right figures in the upper row show the
respective gradient distributions (the horizontal axis is
$f_x$ and the vertical axis is $f_y$) for the model and the data
views. The lower figure shows the reconstructed image 
data from the model view by the affine transformation 
that was recovered. We expect this figure to coincide 
with the corresponding portion of the upper right picture 
in Figure 2. From the figure, we see that the algorithm 
performed almost perfectly. 

In Figure 8 the results on the Baby-Wipe container
pictures are given. The left and right figures in the
upper row show the respective gradient distributions for
the model and the data views. The lower figure is the
reconstructed image data. We expect this to coincide with
the corresponding portion of the upper right picture of
Figure 3. The accuracy is again fairly good, although
not as good as that obtained by the affine invariant plus
second order weighted moments. Likewise, Figure 9 shows
the results on the Tea-Box pictures. The result is almost
as good as that obtained using the affine invariant.

In Figure 10, we show the reconstructed image data
given the perturbation in the original image. We used
the same data as that used in the sensitivity tests for the
affine invariant method. The figures show the respective
results for missing-region fractions of 5% (left), 15%
(middle), and 25% (right). In Figure 11, the values of
$\delta$ (vertical axis), the accuracy in recovering the affine
parameters, are plotted against the percentage of the
missing region (horizontal axis) in the given image data.
We compared these results with those obtained by the
affine invariant method presented previously. The results
of the differential method (plotted as blocks) are less
sensitive to perturbations than those obtained by the
affine invariant method (plotted as stars). This is
probably due to the use of the differential (gradient)
distribution, as described previously.

5 Conclusion 

In this paper, we proposed new algorithms for 3D ob- 
ject recognition that provide closed-form solutions for 
recovering the transformations relating the model to the 
image. We proposed two different algorithms: The first 
one is based on the affine plus rotation invariants using 
no higher than second or third order moments of the im- 
age. Some results on natural pictures demonstrated the 
effectiveness of the proposed algorithm. An error analy- 
sis was also given to study the sensitivity of the algorithm 
to perturbations. The second algorithm used differential 
properties of the image attribute. Results demonstrated 
that the use of differential properties of image attributes 
allows a recovery of the parameters that is insensitive 
to missing regions in the given image. This suggested a 
new direction of object recognition in the sense that it 
may provide a robust technique using global features for 
recovering transformations relating the model to the im- 
age. Differential properties have been extensively used 
in motion analysis (e.g., [5]), but limited to infinitesimal
motions of the object. In contrast to the case of motion 
analysis, our case is not limited to infinitesimal motion. 
The new method can deal with any motion of the planar 
surface, as long as the change of the image attribute is 
constrained within a scale factor at each position on the 
object. Though all the demonstrations were only on
planar patches, as we described, the method can be
connected with the full 3D model of the object to recover
the full 3D information via direct computation.
Appendix: Recovering affine parameters
via single point correspondence

In this appendix we give theoretical arguments showing 
that even a single point correspondence between two dif- 
ferent views suffices to recover the affine parameters by 
using differential properties of the image. To do this we 
assume that we have a nice image attribute (function) I
which has the perfect invariant property between
different views such that $I(X) = I'(X')$ and $I \in C^2$, where
$X' = LX$.

(Comments: This complete invariance assumption may 
seem to be unrealistic in practice. But, again, as argued 
in [14] when the ambient light is not changed it is known 
that the ratios of the sensor outputs of different channels 
are invariant if the sensors are narrow band.) 
Taking the gradient of I we have: 



$$I_x = L_{11} I'_{x'} + L_{21} I'_{y'}, \qquad I_y = L_{12} I'_{x'} + L_{22} I'_{y'} \qquad (69)$$

Deriving the second order derivatives, we have:



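$$I_{xx} = L_{11}^2 I'_{x'x'} + 2 L_{11} L_{21} I'_{x'y'} + L_{21}^2 I'_{y'y'} \qquad (70)$$

$$I_{xy} = L_{11} L_{12} I'_{x'x'} + (L_{11} L_{22} + L_{12} L_{21}) I'_{x'y'} + L_{21} L_{22} I'_{y'y'} \qquad (71)$$

$$I_{yy} = L_{12}^2 I'_{x'x'} + 2 L_{12} L_{22} I'_{x'y'} + L_{22}^2 I'_{y'y'} \qquad (72)$$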

From (69) we get $L_{21} = (I_x - L_{11} I'_{x'}) / I'_{y'}$, and
substituting this into (70) and rearranging we have

$$\left(I'_{x'x'} I'^{2}_{y'} - 2 I'_{x'} I'_{y'} I'_{x'y'} + I'_{y'y'} I'^{2}_{x'}\right) L_{11}^2 - 2\left(-I'_{y'} I'_{x'y'} + I'_{x'} I'_{y'y'}\right) I_x L_{11} + I_x^2 I'_{y'y'} - I'^{2}_{y'} I_{xx} = 0 \qquad (73)$$

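Likewise, we have a similar equation for $L_{22}$ (and $L_{12}$).
Then, solving these quadratic equations we obtain:

$$L_{11} = \frac{\left(-I'_{y'} I'_{x'y'} + I'_{x'} I'_{y'y'}\right) I_x \pm |I'_{y'}|\, f_1}{I'_{x'x'} I'^{2}_{y'} - 2 I'_{x'} I'_{y'} I'_{x'y'} + I'_{y'y'} I'^{2}_{x'}} \qquad (74)$$

$$L_{21} = \frac{\left(-I'_{x'} I'_{x'y'} + I'_{y'} I'_{x'x'}\right) I_x \mp (I'_{x'}/I'_{y'})\, |I'_{y'}|\, f_1}{I'_{x'x'} I'^{2}_{y'} - 2 I'_{x'} I'_{y'} I'_{x'y'} + I'_{y'y'} I'^{2}_{x'}} \qquad (75)$$

$$L_{12} = \frac{\left(-I'_{y'} I'_{x'y'} + I'_{x'} I'_{y'y'}\right) I_y \mp (I'_{y'}/I'_{x'})\, |I'_{x'}|\, f_2}{I'_{y'y'} I'^{2}_{x'} - 2 I'_{y'} I'_{x'} I'_{x'y'} + I'_{x'x'} I'^{2}_{y'}} \qquad (76)$$

$$L_{22} = \frac{\left(-I'_{x'} I'_{x'y'} + I'_{y'} I'_{x'x'}\right) I_y \pm |I'_{x'}|\, f_2}{I'_{y'y'} I'^{2}_{x'} - 2 I'_{y'} I'_{x'} I'_{x'y'} + I'_{x'x'} I'^{2}_{y'}} \qquad (77)$$

where

$$f_1 = \sqrt{I_x^2\left(I'^{2}_{x'y'} - I'_{x'x'} I'_{y'y'}\right) + I_{xx}\left(I'_{x'x'} I'^{2}_{y'} - 2 I'_{x'} I'_{y'} I'_{x'y'} + I'_{y'y'} I'^{2}_{x'}\right)}$$

$$f_2 = \sqrt{I_y^2\left(I'^{2}_{x'y'} - I'_{y'y'} I'_{x'x'}\right) + I_{yy}\left(I'_{y'y'} I'^{2}_{x'} - 2 I'_{y'} I'_{x'} I'_{x'y'} + I'_{x'x'} I'^{2}_{y'}\right)}$$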
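
The single point recovery can be checked numerically with a short sketch. It is an illustration under assumptions: the gradient `g` and Hessian `H` of I' at the data point, and the matrix `L_true`, are made-up values, and the function names are hypothetical. Each column of L is obtained from its quadratic equation (two roots each), and the cross-derivative relation (71) is then used to discard inconsistent pairings of roots.

```python
import numpy as np
from itertools import product

def column_candidates(g, H, Id, Idd):
    """Two roots for one column (L_1j, L_2j) of L.

    g, H   : gradient and Hessian of I' at X' (data view)
    Id, Idd: first and second derivative of I at X along x (j=1) or y (j=2)
    """
    gx, gy = g
    Hxx, Hxy, Hyy = H[0, 0], H[0, 1], H[1, 1]
    A = Hxx * gy**2 - 2 * gx * gy * Hxy + Hyy * gx**2   # common denominator
    f = np.sqrt(max(Id**2 * (Hxy**2 - Hxx * Hyy) + Idd * A, 0.0))
    out = []
    for s in (1.0, -1.0):
        top = ((-gy * Hxy + gx * Hyy) * Id + s * gy * f) / A
        bottom = ((-gx * Hxy + gy * Hxx) * Id - s * gx * f) / A
        out.append(np.array([top, bottom]))
    return out

def affine_candidates(g, H, g_model, H_model, tol=1e-8):
    """Pairings of column roots that also satisfy the mixed relation (71)."""
    col1 = column_candidates(g, H, g_model[0], H_model[0, 0])
    col2 = column_candidates(g, H, g_model[1], H_model[1, 1])
    keep = []
    for a, b in product(col1, col2):
        if abs(a @ H @ b - H_model[0, 1]) < tol:
            keep.append(np.column_stack([a, b]))
    return keep

# Made-up derivatives of I' at the data point and a made-up transformation.
g = np.array([0.7, -1.1])
H = np.array([[1.3, 0.4], [0.4, -0.8]])
L_true = np.array([[1.2, 0.4], [-0.3, 0.9]])

# Derivatives of I at the model point implied by I(X) = I'(L X).
g_model = L_true.T @ g
H_model = L_true.T @ H @ L_true

cands = affine_candidates(g, H, g_model, H_model)
found = any(np.allclose(L, L_true, atol=1e-8) for L in cands)
```

The true matrix appears among the surviving candidates. More than one candidate may survive the check against (71), so a further test is still needed to choose among them.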
Figure 2: Results by affine invariant method on the Cocoa-Box pictures.
The upper row of pictures shows two gray level pictures of the same Cocoa-Box taken from two different view points: the left 
view was used for the model, the right view for the data. The left and right figures in the middle row show the corresponding 
normalized images. Indeed, we see that the two figures in this row coincide if we rotate the left one by 180 degrees around its 
centroid. The left and right figures in the lower row are the respective reconstructed image data from the model view (shown 
in the upper left) by the recovered affine transformation using, lower left: affine invariant plus second order weighted 
moments of the gray level, lower right: third order moments of the binary image for computing the rotation angle. If the 
method works correctly, then those reconstructed images should coincide with the corresponding image portion found in the 
upper right figure. Indeed, we see that both of the methods worked very well for recovering the transformation parameters. 
Figure 3: Results by affine invariant method on the Baby-Wipe pictures.
The upper row of pictures shows two gray level pictures of a Baby-Wipe container of which the front part was used for the
experiment: the left view was used for the model, while the right view was used for the data. The left and right figures in 
the middle row show the respective normalized images. Indeed, we see that the two figures coincide if we rotate the left 
figure by about 180 degrees around its centroid. The bottom figure is the reconstructed image data from the model view by 
the recovered affine transformation using affine invariant plus second order weighted moments for computing the rotation 
angle. We expect that the reconstructed image coincides well with the image in the upper right. 
Figure 4: Results by affine invariant method on the Tea-Box pictures.
The upper row shows the pictures of a Tea-Box: the left view was used for the model, while the right view was used for the data.
The left and right figures in the middle row are the respective normalized images up to a rotation. The left and right figures 
in the lower row show the respective reconstructed image data from the model view using the recovered affine transformation 
based on affine invariant plus second order weighted moments of the gray level (left) and third order moments of the binary 
image (right) for recovering the rotation angle. From the figure, we see that both of the reconstructed images coincide well 
with the original data shown in the upper right. Though both the methods worked fairly well, the method using second 
order weighted moments performed slightly better. Considering that both of the reconstructed images are tilted a little bit 
in a similar manner, perhaps some errors were introduced in the manual region extraction. 
Figure 5: Sensitivity analysis against perturbations in the given image.
The upper pictures show examples of the perturbed image data for which some percentage of the image region was dropped: 
left 5%, middle 15%, right 25%. The lower pictures show the respective reconstructed image data. The perturbations in the 
image data were produced by dropping particular connected regions from the (almost) perfect image data. 
[Plot title: Sensitivity of affine invariant method against perturbations in moments]

Figure 6: Sensitivity of the recovered parameters by affine plus rotation invariants against perturbations.
The horizontal axis is $a$ while the vertical axis is $\delta$. The value of $\delta$, the accuracy in recovering the affine parameters, is almost
proportional to $a$, the perturbation, when it is small, but the slope increases rapidly as $a$ grows.
Figure 7: Results by combination of geometric and differential properties on the Cocoa-Box pictures.
The left and right figures in the upper row show the respective gradient distributions (the horizontal axis is $f_x$ and the
vertical axis is $f_y$) for the model and the data views. The lower figure shows the reconstructed image data from the model
view by the affine transformation that was recovered. We expect this figure to coincide with the corresponding portion of the
upper right picture in Figure 2. From the figure, we see that the algorithm performed almost perfectly. 
Figure 8: Results by combination of geometric and differential properties on the Baby-Wipe pictures.
The left and right figures in the upper row show the respective gradient distribution for the model and the data view. The
lower figure is the reconstructed image data, which we expect to coincide with the corresponding portion of the upper right
picture of Figure 3. The accuracy is again fairly good, though not as good as that obtained by affine invariant plus
second order weighted moments. 
Figure 9: Results by combination of geometric and differential properties on the Tea-Box pictures.
The left and right figures in the upper row show the respective gradient distribution for the model and the data view. The
lower figure is the reconstructed image data, which we expect to coincide with the corresponding portion of the upper right
picture of Figure 4. The result is almost as good as the one obtained using the affine invariant.




Figure 10: Sensitivity of differential method against perturbations. 
The figure shows the reconstructed image data for the same perturbed images as those used in the sensitivity tests for the affine
invariant method. The pictures show respective results for the perturbation percentage in the given image: left 5%, middle 
15%, right 25%. 
[Plot title: Sensitivity of the algorithms against perturbations: Errors vs. Missing Region]

Figure 11: Sensitivity of the recovered parameters by differential method against perturbations.
The values of $\delta$ (vertical axis), the accuracy in recovering the affine parameters, are plotted against the percentage of the
perturbation in the given image data (horizontal axis). The results by affine invariant are plotted using blocks, while those
by differential method are plotted using stars. The results by the differential method are less sensitive to
perturbations than those by the affine invariant method.